1 Introduction

Border security is essential to the national welfare of any country, especially in modern times. In recent elections in the United States, our borders with Canada and Mexico have been a "hot topic" during discussions on immigration policy and national defense. Understanding our borders and the different types of crossings (everyday commuting, economic activity, and tourism) is crucial for helping our leaders make wise policy decisions and for being better global citizens.

During our exploration of US border crossing data, we investigated the differences between the Mexican and Canadian borders, the temporal aspects of border crossings, how different ports compare to each other across types of crossings, and the effects of major events such as September 11th and COVID-19. Our complete explorations of these data are found on GitHub at https://github.com/rlscott/borderxing.

1.1 Data

Our data are from the Bureau of Transportation Statistics, which is part of the US Department of Transportation. The border crossing data were collected at ports of entry by US Customs and Border Protection (CBP). The data include monthly counts of entries into the United States, broken down into categories of vehicles, containers, passengers, and pedestrians. The data cover the years 1996 to 2023, beginning in April 1996 and ending in September 2023.

The data were found on this Bureau of Transportation Statistics website: https://data.bts.gov/stories/s/jswi-2e7b.

The data include 386,549 entries, where each entry represents a monthly crossing count of a particular category at a port of entry. The columns in our data set include the port name, state, port code, border (US-Mexico or US-Canada), date, measure, value, latitude, and longitude. There are 12 different types of measures or categories in our data: bus passengers, buses, pedestrians, personal vehicle passengers, personal vehicles, rail containers empty, rail containers loaded, train passengers, trains, truck containers empty, truck containers loaded, and trucks. The value column contains the monthly count of that particular measure.

We made a few modifications to tidy our data. Fortunately, the data were already in long format, which made analysis much easier. First, we used the lubridate package to create month and year columns (instead of only a month-year date column). We then further cleaned the data with tools from the dplyr package. We set the port name, state, port code, border, month, and measure columns to be factors. We created an indicator column, called type, with two levels, object and people, to better distinguish between different types of traffic. We found this necessary because simply summing all crossings without regard to the type of measure would double-count some crossings and produce a kind of weighted aggregate. For example, a personal vehicle with three passengers would be counted as four crossings (one vehicle plus three passengers), whereas three pedestrians would count as only three crossings in the port total. Differentiating between people and objects solves this double-counting problem in our analysis.
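
The double-counting argument above can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the authors' R code. The `YYYY-MM` date format, the row layout, the port name, and the assignment of measures to the people level are assumptions made for illustration.

```python
from datetime import datetime

# Assumption: these four measures count people; every other measure counts objects.
PEOPLE_MEASURES = {
    "Bus Passengers",
    "Pedestrians",
    "Personal Vehicle Passengers",
    "Train Passengers",
}

def tidy_row(row):
    """Return a copy of a raw row with month, year, and type columns added."""
    parsed = datetime.strptime(row["date"], "%Y-%m")  # assumed date format
    kind = "people" if row["measure"] in PEOPLE_MEASURES else "object"
    return {**row, "month": parsed.month, "year": parsed.year, "type": kind}

# Hypothetical rows: one personal vehicle carrying three passengers, plus three pedestrians.
raw = [
    {"port": "Example Port", "date": "1996-04", "measure": "Personal Vehicles", "value": 1},
    {"port": "Example Port", "date": "1996-04", "measure": "Personal Vehicle Passengers", "value": 3},
    {"port": "Example Port", "date": "1996-04", "measure": "Pedestrians", "value": 3},
]
tidy = [tidy_row(r) for r in raw]

# Sum separately within each type instead of across all measures.
totals = {}
for r in tidy:
    totals[r["type"]] = totals.get(r["type"], 0) + r["value"]

naive_total = sum(r["value"] for r in tidy)
print(naive_total)  # 7: the vehicle and its passengers are both counted
print(totals)       # {'object': 1, 'people': 6}
```

Summing within each type keeps six people and one vehicle apart, rather than reporting seven "crossings".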

We also cleaned our data by distinguishing between two ports on the US-Canada border with the same name, Eastport. We noticed that while there are 118 unique port codes, there are only 117 unique port names. Further investigation revealed one Eastport in Maine and another in Idaho. In our data, we renamed these two ports to ‘Eastport ME’ and ‘Eastport ID’.
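
One way to detect and resolve such a name collision is sketched below in Python (not the authors' R code). The port codes are placeholders rather than the real codes, and the two-letter state abbreviations are an assumed column format.

```python
from collections import defaultdict

# Hypothetical (port_name, state, port_code) records; the codes are placeholders.
ports = [
    ("Eastport", "ME", "code_a"),
    ("Eastport", "ID", "code_b"),
    ("Sweetgrass", "MT", "code_c"),
]

# Collect the distinct port codes seen under each port name.
codes_by_name = defaultdict(set)
for name, state, code in ports:
    codes_by_name[name].add(code)

# A name that maps to more than one code is ambiguous.
ambiguous = {name for name, codes in codes_by_name.items() if len(codes) > 1}

# Append the state to ambiguous names only, leaving all other names unchanged.
renamed = [
    (f"{name} {state}" if name in ambiguous else name, state, code)
    for name, state, code in ports
]
print(sorted(ambiguous))            # ['Eastport']
print([name for name, _, _ in renamed])
```

Comparing the number of distinct names with the number of distinct codes is the same check that exposed the 118 versus 117 mismatch.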

All of these changes while cleaning our border data resulted in the following structure:

2 Curiosity

2.1 General exploration of all ports of entry

Using the dplyr package, we found 118 unique ports of entry in our data: 90 on the US-Canada border and 28 on the US-Mexico border.

Figure 2.1 shows the location of each border crossing station mapped by latitude and longitude. This interactive map was created using the leaflet package. Anchorage is not included in this map since its coordinates are missing from the data (this port is discussed in more detail later).

Figure 2.1: Map of the 118 US Border Ports of Entry

As shown in Figure 2.2, North Dakota and Washington have the most ports of entry for the US-Canadian border and Texas has the most ports of entry for the US-Mexico border. There are 10 states on the US-Canada border and 4 states on the US-Mexico border.

Figure 2.2: Count of Ports of Entry by State, Colored by Border

We also investigated the multinomial distribution of measure counts for each port. First, we found the median value of each of the twelve measures, separately for ports on the US-Canada border and for ports on the US-Mexico border. We then converted each set of median counts to proportions, creating one reference multinomial distribution with 12 outcomes per border. Next, we found the multinomial distribution for each of the 118 ports of entry. To find ports whose measure proportions deviate strongly from the median distribution for their respective border, we used a \(\chi^2\) goodness of fit test with Monte Carlo simulation. We used proportions instead of the actual counts because some ports have far more traffic than others. Since the \(\chi^2\) test statistic is a summation involving the observed and expected values, a port with heavy traffic would naturally have a higher test statistic than a low-traffic port, even when the low-traffic port differs more from the median distribution. Including Monte Carlo simulation helped improve the accuracy of our test, since proportions are extremely small values. Because of these approximations, we decided to ignore p-values and instead rank the ports by test statistic, resulting in Figure 2.3.
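
The following Python sketch illustrates the idea under stated assumptions; it is not the authors' R code. It uses three hypothetical categories instead of twelve, made-up counts, and assumes every expected value is positive. The helper names (`proportions`, `chi2_stat`, `monte_carlo_p`) are introduced here for illustration, and the Monte Carlo step simulates counts rather than proportions, which may differ from the authors' setup.

```python
import random

def proportions(counts):
    """Convert raw counts to proportions that sum to one."""
    total = sum(counts)
    return [c / total for c in counts]

def chi2_stat(observed, expected):
    """Pearson statistic sum((O - E)^2 / E); assumes every expected value is positive."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def monte_carlo_p(observed_counts, expected_props, draws=500, seed=0):
    """Share of simulated multinomial samples at least as extreme as the observed one."""
    rng = random.Random(seed)
    n = sum(observed_counts)
    categories = range(len(expected_props))
    expected_counts = [n * p for p in expected_props]
    observed_stat = chi2_stat(observed_counts, expected_counts)
    hits = 0
    for _ in range(draws):
        sample = rng.choices(categories, weights=expected_props, k=n)
        simulated = [sample.count(i) for i in categories]
        if chi2_stat(simulated, expected_counts) >= observed_stat:
            hits += 1
    return (hits + 1) / (draws + 1)

# Hypothetical median counts for one border, turned into a reference distribution.
expected = proportions([50, 30, 20])

# Hypothetical ports: "small" and "large" share the same mix at different traffic volumes.
ports = {
    "small": [30, 10, 10],
    "large": [3000, 1000, 1000],
    "typical": [26, 15, 9],
}

# Statistic on proportions: insensitive to overall traffic volume.
stats = {name: chi2_stat(proportions(c), expected) for name, c in ports.items()}
small_stat, big_stat = stats["small"], stats["large"]

# Statistic on raw counts: grows with traffic volume even when the mix is identical.
count_stats = {
    name: chi2_stat(c, [sum(c) * p for p in expected]) for name, c in ports.items()
}

ranked = sorted(stats.items(), key=lambda item: item[1], reverse=True)
p_value = monte_carlo_p(ports["small"], expected)
print(ranked)
print(count_stats)
print(p_value)
```

In this toy example the small and large ports receive the same statistic on proportions, while their count-based statistics differ by the ratio of their traffic volumes, which is the motivation for ranking on proportions.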

Figure 2.3: Multinomial distributions of each port, ranked by test statistic

Figure 2.3 showcases the multinomial distribution for each port, arranged with the port with the largest test statistic at the top. Labels were removed for some of the ports to prevent over-plotting. The Canada (1) pane contains the ports with the highest test statistics on the US-Canada border. Interestingly, pedestrians, personal vehicles, and personal vehicle passengers have the highest overall counts at most of the ports. For the US-Mexico border, Cross Border Xpress and Boquillas have the greatest deviation from the median distribution, with a large majority of crossings being pedestrians. Upon further investigation, we found that Cross Border Xpress is a pedestrian bridge between two airports in San Diego and Tijuana. The distribution of El Paso is closest to the US-Mexico border median distribution. For the US-Canada border, the ports Anchorage and Skagway have the greatest deviation from the US-Canada border median distribution, with Highgate Springs having the least deviation. Also, ports on the US-Mexico border have more pedestrians than ports on the US-Canada border, which can plausibly be explained by climate differences.

2.2 US-Canada ports of entry

We identified outliers in the scatter plot of bus crossings in Figure 2.4. Notably, in November 1999, the Sweetgrass port recorded an unusually high count of 1375 buses. Conversely, in June 2021, the Limestone port reported a comparatively low count of 173 buses.

Figure 2.4: Number of Object Crossings by State

We conducted an analysis of the total object transportation for the year 1999 in Montana and the year 2021 in Maine. Upon pinpointing specific months within each year in Figure 2.5, it becomes evident that these data points do not align with the peak of the plot. This observation suggests that bus transportation may serve as an alternative choice, particularly when other modes of transportation prove impractical for individuals crossing the border.

Figure 2.5: Plot of Transportation via Object by Month

In order to validate our hypothesis, we conducted an examination of Bus Passengers transportation across states, revealing six outliers in Figure 2.6. These outliers are as follows:

  • 46712 Bus Passengers in November 1998 in Maine at Port Jackman
  • 159086 Bus Passengers in October 159086 in Washington at Port Blaine
  • 6026 Bus Passengers in June 1997 in Minnesota at Port Noyes
  • 2203 Bus Passengers in May 2000 in Idaho at Eastport ID
  • 135753 Bus Passengers in December 1997 in Michigan at Detroit
  • 19322 Bus Passengers in May 2019 in Vermont at Highgate Springs

These instances, deviating significantly from the norm, warrant further investigation and may provide insights into the dynamics of bus transportation across state borders.

Figure 2.6: Number of People Crossings by State

Upon isolating the specific months within the annual plot of total transportation via people in Figure 2.7, we observed that the identified outliers in both Buses and Bus Passengers transportation did not coincide with the peak of the plot. This outcome substantiates our initial assumption that bus transportation serves as an alternative choice, particularly when conventional transportation methods prove impractical for individuals crossing the border. Further analysis indicates that these outliers are not attributed to significant events, reinforcing the notion that the observed deviations in Buses/Bus Passengers transportation are likely driven by individual travel choices rather than external factors.

Figure 2.7: Plot of Transportation via People by Month

Subsequently, we aggregated the data into seven-year intervals and examined the average transportation volumes for both objects and people in Figure 2.8. Our analysis revealed a pronounced seasonal trend, with traffic peaking during the summer and falling to a corresponding trough during the winter months.

Additionally, we observed that average traffic for both objects and people was highest between 1996 and 2002. Over the subsequent 14 years, a discernible decline was noted, followed by a resurgence in traffic from 2017 to 2023.
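
The seven-year grouping can be sketched as follows in Python (not the authors' R code). The records are hypothetical, and the interval boundaries assume the bins start in 1996, which matches the 1996 to 2002 and 2017 to 2023 ranges mentioned above.

```python
from collections import defaultdict

FIRST_YEAR = 1996  # first year covered by the data
SPAN = 7           # interval length in years

def interval_label(year):
    """Map a year to its seven-year interval, e.g. 2005 -> '2003-2009'."""
    start = FIRST_YEAR + ((year - FIRST_YEAR) // SPAN) * SPAN
    return f"{start}-{start + SPAN - 1}"

# Hypothetical (year, month, value) records.
records = [
    (1996, 7, 120),
    (2002, 7, 100),
    (2003, 7, 80),
    (2017, 1, 60),
    (2023, 1, 40),
]

# Accumulate a running [sum, count] pair for every (interval, month) cell.
sums = defaultdict(lambda: [0, 0])
for year, month, value in records:
    cell = sums[(interval_label(year), month)]
    cell[0] += value
    cell[1] += 1

averages = {key: total / count for key, (total, count) in sums.items()}
print(averages)
```

Averaging within each (interval, month) cell is what lets the seasonal shape be compared across the four intervals.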

Figure 2.8: Plot of Average Crossings in Each Seven-Year Interval

2.3 US-Mexico ports of entry

Figure 2.9: Sum of people crossing the Mexican and Canadian borders over time. The red line indicates 9/11, gold indicates when Donald Trump was elected, and purple indicates the start of COVID-19.

Figure 2.9 displays the aggregate pattern from January 1996 to September 2023 for both the Mexican and Canadian borders. An important note is that this is the total number of individuals who crossed the border, as opposed to objects. The vertical lines mark potentially significant historical events that may explain certain patterns in the graph.

We can easily see that COVID-19 had a large impact on both Canadian and Mexican border crossings. This is unsurprising, since many institutions shut down and border crossing was severely limited. The presidency of Donald Trump, in contrast, seemed to have very little if any effect on either border, despite his heavy involvement with the Mexican border. 9/11, however, seemed to have a more varied effect. It was followed by an obvious lasting reduction in Mexican border crossings, yet it had nearly no effect on Canadian border crossings. This could be for any number of reasons, xenophobia among them; however, discerning the cause is not possible with the current data.

The story of 9/11 gets even more interesting on the Mexican border. Two of the largest ports, Calexico in California and El Paso in Texas, show a particularly pronounced change in total crossings at the time of 9/11, as seen in Figure 2.10.

Figure 2.10: Sum of people crossing the Mexican border ports Calexico and El Paso over time. The red line indicates 9/11

Figure 2.11: Sum of people crossing the US and Canadian border over time. The red line indicates 9/11

Since these ports are so large, it is reasonable to believe they exert a lot of control over the overall pattern in Figure 2.9. Indeed, we see a much smaller effect of 9/11 when these two ports are removed, as shown by the blue line in Figure 2.11. In fact, the lasting reduction in border crossings is almost entirely eliminated, and all that remains is a short dip. This dip seems to be the result of a few different ports and is not as easily explained away. One large example of a port with a short dip is San Ysidro in California. That just a few ports explain the majority of the difference between pre-9/11 and post-9/11 border crossings implies that an investigation into those two ports could be the most effective way of determining why 9/11 had such a large effect on the Mexican border while having a small effect on the Canadian one.

It is also worth noting that other, smaller ports exhibit the same phenomenon as El Paso and Calexico. However, they are so small that adding them to the list of removed ports barely changes the pattern at all. One example is Roma in Texas. This means that Calexico and El Paso should not be treated as a comprehensive explanation of the 9/11 effect, but as the dominant locations of the effect.
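
The comparison of totals with and without the two dominant ports can be sketched in Python as below (not the authors' R code). All crossing values are made up for illustration and do not come from the data; `yearly_totals` is a helper introduced here.

```python
from collections import defaultdict

EXCLUDED = {"Calexico", "El Paso"}

def yearly_totals(rows, excluded=frozenset()):
    """Sum crossings per year, skipping any port listed in `excluded`."""
    totals = defaultdict(int)
    for port, year, value in rows:
        if port not in excluded:
            totals[year] += value
    return dict(totals)

# Hypothetical (port, year, people_crossings) rows; the values are invented.
rows = [
    ("Calexico", 2000, 500), ("Calexico", 2002, 200),
    ("El Paso", 2000, 700), ("El Paso", 2002, 300),
    ("San Ysidro", 2000, 900), ("San Ysidro", 2002, 850),
]

all_ports = yearly_totals(rows)
without_two = yearly_totals(rows, EXCLUDED)
print(all_ports)    # {2000: 2100, 2002: 1350}
print(without_two)  # {2000: 900, 2002: 850}
```

Plotting both series on the same axes shows how much of the aggregate drop is attributable to the excluded ports; adding further small ports to `EXCLUDED` is a one-line change.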

3 Scepticism & questions about the data

According to the Bureau of Transportation Statistics FAQ on border crossing data, there are no passenger trains between Mexico and the United States. For freight trains, crews are changed at the US-Mexico border. This means there should be no non-zero entries for train passengers on the US-Mexico border in our data. However, there are 1481 non-zero entries for train passengers, ranging from 1996 to 2023, with most of them in California. This inconsistency needs to be investigated further.
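
A check of this kind can be sketched in Python as follows (not the authors' R code). The rows and their values are hypothetical, and the exact border and measure labels are assumed spellings.

```python
from collections import Counter

# Hypothetical rows; label spellings and all values are assumptions.
rows = [
    {"border": "US-Mexico", "state": "California", "measure": "Train Passengers", "value": 12},
    {"border": "US-Mexico", "state": "California", "measure": "Train Passengers", "value": 0},
    {"border": "US-Mexico", "state": "Texas", "measure": "Train Passengers", "value": 4},
    {"border": "US-Mexico", "state": "Texas", "measure": "Trains", "value": 30},
    {"border": "US-Canada", "state": "New York", "measure": "Train Passengers", "value": 250},
]

# Keep only entries that should not exist according to the FAQ.
suspicious = [
    r for r in rows
    if r["border"] == "US-Mexico"
    and r["measure"] == "Train Passengers"
    and r["value"] > 0
]

# Tally the suspicious entries by state to see where they concentrate.
by_state = Counter(r["state"] for r in suspicious)
print(len(suspicious))  # 2
print(by_state.most_common())
```

Tallying by state, and similarly by year or port, helps narrow down where the unexpected entries originate.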

Most ports of entry have over 100 entries in our data set, but three ports have fewer than 100: Algonac, Anchorage, and Cross Border Xpress. For Cross Border Xpress, this makes sense since it opened recently, in 2015. Interestingly, Anchorage has only one entry and, although in Alaska, is quite distant from the US-Canada border. It only lists empty shipping containers in September 2023.

Another concern with our data is the zero entries. We wonder whether they are actually zeros or NA values. Sometimes ports are closed (especially seasonally during winter on the US-Canada border), but we know that certain high-traffic ports with many commuters probably still had crossings during months where zeros are recorded. For example, during the year 1996, San Ysidro (a major commuting port on the US-Mexico border) has a personal vehicle passenger count of zero for every month. For the purposes of our analysis, we treated the zero entries as zeros and not as NA values.
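
One way to flag candidates for such hidden missing values is to look for port, measure, and year combinations in which every recorded month is zero. The Python sketch below (not the authors' R code) uses hypothetical rows; `all_zero_years` is a helper introduced here, and the non-zero values are invented.

```python
from collections import defaultdict

def all_zero_years(rows):
    """Return (port, measure, year) groups whose recorded monthly values are all zero."""
    values = defaultdict(list)
    for port, measure, year, month, value in rows:
        values[(port, measure, year)].append(value)
    return sorted(key for key, vals in values.items() if all(v == 0 for v in vals))

# Hypothetical (port, measure, year, month, value) rows.
rows = [
    ("San Ysidro", "Personal Vehicle Passengers", 1996, 4, 0),
    ("San Ysidro", "Personal Vehicle Passengers", 1996, 5, 0),
    ("San Ysidro", "Personal Vehicle Passengers", 1997, 1, 100),
    ("Example Port", "Pedestrians", 1996, 4, 0),
    ("Example Port", "Pedestrians", 1996, 5, 25),
]

flagged = all_zero_years(rows)
print(flagged)  # [('San Ysidro', 'Personal Vehicle Passengers', 1996)]
```

A flagged group at a known high-traffic port is more likely to be missing data than a true zero, whereas isolated zero months may reflect seasonal closures.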

4 Conclusion & future work

Overall, through our analysis we gleaned that traffic on the US-Canada border appears to be very cyclic, whereas traffic on the US-Mexico border appears to be more volatile and possibly subject to international events and some US foreign policy. The time series analysis shows that major events affected the two borders in ways that were sometimes similar, as with COVID-19, and sometimes different, as with 9/11.

One possible avenue for future work is distinguishing between commercial and private traffic. The two may be confounded in the data set; for example, passenger trains and freight trains are both counted as ‘trains’. However, we may be able to investigate the flow of imports and exports by comparing empty truck and rail containers (exports) with loaded truck and rail containers.